{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "ukmlUQzkTeUj"
   },
   "source": [
    "# Data Exploration\n",
    "\n",
    "This lab is *optional*. It demonstrates advanced Pandas usage and in-depth data analysis.\n",
    "\n",
    "---\n",
    "\n",
    "Learning objectives:\n",
    "\n",
    "1. Learn useful patterns for exploring data before modeling\n",
    "2. Gain an understanding of the dataset and identify any data issues.\n",
    "\n",
    "The goal of this notebook is to explore our base tables before we began feature engineering and modeling. We will explore the price history of stock in the S&P 500.\n",
    "\n",
    "* Price history : Price history of stocks\n",
    "* S&P 500 : A list of all companies and symbols for companies in the S&P 500\n",
    "\n",
    "For our analysis, let's limit price history since 2000. In general, the further back historical data is used the lower it's predictive power can be."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "8DGQXErETeUs"
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "PROJECT = 'your-gcp-project' # Change to your project.\n",
    "BUCKET = PROJECT\n",
    "\n",
    "os.environ['PROJECT'] = PROJECT\n",
    "os.environ['BUCKET'] = BUCKET"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "4kbfOkqzTeUq"
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import seaborn as sns\n",
    "\n",
    "from matplotlib import pyplot as plt\n",
    "from google.cloud import bigquery\n",
    "from IPython.core.magic import register_cell_magic\n",
    "from IPython import get_ipython\n",
    "\n",
    "bq = bigquery.Client(project=PROJECT)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "H-dtMIs0pQ54"
   },
   "outputs": [],
   "source": [
    "# Allow you to easily have Python variables in SQL query.\n",
    "@register_cell_magic('with_globals')\n",
    "def with_globals(line, cell):\n",
    "    contents = cell.format(**globals())\n",
    "    if 'print' in line:\n",
    "        print(contents)\n",
    "    get_ipython().run_cell(contents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Preparing the dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create the dataset in our project BiqQuery and import the stock data by running the following cells:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!bq mk stock_src"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "TABLE=price_history\n",
    "SCHEMA=symbol:STRING,Date:DATE,Open:FLOAT,Close:FLOAT\n",
    "                \n",
    "gcloud storage buckets create gs://$BUCKET\n",    "test -f $TABLE.csv || unzip ../stock_src/$TABLE.csv.zip\n",
    "gcloud storage cp $TABLE.csv gs://$BUCKET/stock_src/$TABLE.csv\n",    "bq load --source_format=CSV --skip_leading_rows=1 \\\n",
    "    stock_src.$TABLE gs://$BUCKET/stock_src/$TABLE.csv  $SCHEMA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "TABLE=eps\n",
    "SCHEMA=date:DATE,company:STRING,symbol:STRING,surprise:STRING,reported_EPS:FLOAT,consensus_EPS:FLOAT\n",
    "                \n",
    "test -f $TABLE.csv || unzip ../stock_src/$TABLE.csv.zip\n",
    "gcloud storage cp $TABLE.csv gs://$BUCKET/stock_src/$TABLE.csv\n",    "bq load --source_format=CSV --skip_leading_rows=1 \\\n",
    "    stock_src.$TABLE gs://$BUCKET/stock_src/$TABLE.csv  $SCHEMA"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%bash\n",
    "\n",
    "TABLE=snp500\n",
    "SCHEMA=company:STRING,symbol:STRING,industry:STRING\n",
    "                \n",
    "test -f $TABLE.csv || unzip ../stock_src/$TABLE.csv.zip\n",
    "gcloud storage cp $TABLE.csv gs://$BUCKET/stock_src/$TABLE.csv\n",    "bq load --source_format=CSV --skip_leading_rows=1 \\\n",
    "    stock_src.$TABLE gs://$BUCKET/stock_src/$TABLE.csv  $SCHEMA"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "7Ti4ZYgApQ55"
   },
   "source": [
    "Let's look at the tables and columns we have for analysis. Please query the `INFORMATION_SCHEMA`."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Learning objective 1.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 254
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "JL5JvIhVpQ56",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "cb7c287b-a752-4f68-bfcd-6b16ec04a3ea"
   },
   "outputs": [],
   "source": [
    "%%with_globals\n",
    "%%bigquery --project {PROJECT}\n",
    "SELECT table_name, column_name, data_type\n",
    "FROM `stock_src.INFORMATION_SCHEMA.COLUMNS`\n",
    "ORDER BY table_name, ordinal_position"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "nhrUS9n3TeUx"
   },
   "source": [
    "## Price History"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "PCa4ezUBTeUy"
   },
   "source": [
    "**TODO**: Visualize stock symbols from the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "%%with_globals\n",
    "%%bigquery --project {PROJECT}\n",
    "SELECT *\n",
    "FROM `stock_src.price_history`\n",
    "LIMIT 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def query_stock(symbol):\n",
    "    return bq.query('''\n",
    "    # TODO: query a specific stock\n",
    "    '''.format(symbol)).to_dataframe()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "5NJ6tHieTeU0",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "df_stock = query_stock('GOOG')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 285
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "lPTw-7DNTeU9",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "f52dd841-11d3-4d4b-fb1e-7e8e22b89c63"
   },
   "outputs": [],
   "source": [
    "df_stock.Date = pd.to_datetime(df_stock.Date)\n",
    "ax = df_stock.plot(x='Date', y='Close', title='price')\n",
    "\n",
    "# Add smoothed plot.\n",
    "df_stock['Close_smoothed'] = df_stock.Close.rolling(100, center=True).mean()\n",
    "df_stock.plot(x='Date', y='Close_smoothed', ax=ax);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Tu6Ew6KHTeVG"
   },
   "source": [
    "**TODO 2**: Compare individual stocks to the S&P 500."
   ]
  },
 {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 291
    },
    "colab_type": "code",
    "id": "WYK6bWrOTeVQ",
    "outputId": "a6002845-8f61-476f-a88c-52c96ad454cd"
   },
   "outputs": [],
   "source": [
    "df_sp = query_stock('gspc')\n",
    "\n",
    "\n",
    "def plot_with_sp(symbol):\n",
    "# TODO: visualize S&P 500 price"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_with_sp('GOOG')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Learning objective 2**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_with_sp('IBM')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "m5nB8CfFpQ6J"
   },
   "source": [
    "Let's see how the price of stocks change over time on a yearly basis. Using the `LAG` function we can compute the change in stock price year-over-year. \n",
    "\n",
    "Let's compute average close difference for each year. This line could, of course, be done in Pandas. Often times it's useful to use some combination of BigQuery and Pandas for exploration analysis. In general, it's most effective to let BigQuery do the heavy-duty processing and then use Pandas for smaller data and visualization."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Learning objective 1, 2**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "jhpq3oBbpQ6J",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "%%with_globals\n",
    "%%bigquery df --project {PROJECT}\n",
    "WITH \n",
    "with_year AS\n",
    "    (\n",
    "    SELECT symbol, \n",
    "    EXTRACT(YEAR FROM date) AS year,\n",
    "    close\n",
    "    FROM `stock_src.price_history`\n",
    "    WHERE symbol in (SELECT symbol FROM `stock_src.snp500`)\n",
    "    ),\n",
    "year_aggregated AS\n",
    "    (\n",
    "    SELECT year, symbol, AVG(close) as avg_close\n",
    "    FROM with_year\n",
    "    WHERE year >= 2000\n",
    "    GROUP BY year, symbol\n",
    "    )\n",
    "SELECT year, symbol, avg_close as close,\n",
    "(LAG( \n",
    "    --# TODO: compute a year lag on avg_close\n",
    ")) \n",
    "    AS next_yr_close\n",
    "FROM year_aggregated\n",
    "ORDER BY symbol, year"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "kxplAbUOpQ6L"
   },
   "source": [
    "Compute the year-over-year percentage increase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "glDZFdaDpQ6L",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "df.dropna(inplace=True)\n",
    "df['percent_increase'] = (df.next_yr_close - df.close) / df.close"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "Bc47qRjTUDas"
   },
   "source": [
    "Let's visualize some yearly stock "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 1000
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "hu_BroHsRiK1",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "b1c2a5bd-9778-467d-b329-e296207bbdec"
   },
   "outputs": [],
   "source": [
    "def get_random_stocks(n=5):\n",
    "    random_stocks = df.symbol.sample(n=n, random_state=3)\n",
    "    rand = df.merge(random_stocks)\n",
    "    return rand[['year', 'symbol', 'percent_increase']]\n",
    "\n",
    "\n",
    "rand = get_random_stocks()\n",
    "for symbol, _df in rand.groupby('symbol'):\n",
    "    plt.figure()\n",
    "    sns.barplot(x='year', y=\"percent_increase\", data=_df)\n",
    "    plt.title(symbol)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "izV3epScpQ6N"
   },
   "source": [
    "There have been some major fluctations in individual stocks. For example, there were major drops during the early 2000's for tech companies."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 195
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "i8iswmRRpQ6N",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "e2d4feaf-fd13-40db-a1bf-8d3b78773990"
   },
   "outputs": [],
   "source": [
    "df.sort_values('percent_increase').head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "Ly5r3Hd5pQ6P",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "stock_symbol = 'YHOO'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "-VIBqHOzpQ6Q",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "%%with_globals\n",
    "%%bigquery df --project {PROJECT}\n",
    "SELECT date, close\n",
    "FROM `stock_src.price_history`\n",
    "WHERE symbol='{stock_symbol}'\n",
    "ORDER BY date"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 283
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "kJhiZu3BpQ6S",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "95261854-6098-4e36-9686-72d37b3e6399"
   },
   "outputs": [],
   "source": [
    "ax = df.plot(x='date', y='close')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "YgkVgETvpQ6U"
   },
   "source": [
    "**Stock splits** can also impact our data - causing a stock price to rapidly drop. In practice, we would need to clean all of our stock data to account for this. This would be a major effort! Fortunately, in the case of [IBM](https://www.fool.com/investing/2017/01/06/ibm-stock-split-will-2017-finally-be-the-year-shar.aspx), for example, all stock splits occurred before the year 2000."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Learning objective 2**\n",
    "\n",
    "**TODO**: Query the IBM stock history and to visualize how the stock splits affect our data. A stock split occurs when there is a sudden drop in price."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "9ghCEZotpQ6V",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "stock_symbol = 'IBM'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "uEF_AVzZpQ6X",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "%%with_globals\n",
    "%%bigquery df --project {PROJECT}\n",
    "SELECT date, close\n",
    "FROM `stock_src.price_history`\n",
    "WHERE symbol='{stock_symbol}'\n",
    "ORDER BY date"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 283
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "ixoQkgPKpQ6b",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "4c7381c1-744a-4a9d-f182-4e09a08f439d"
   },
   "outputs": [],
   "source": [
    "# TODO: can you visualize when the major stock splits occured?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "NtIh4P4jTeVb"
   },
   "source": [
    "## S&P companies list"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "6BsFvmURTeVe",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "%%with_globals\n",
    "%%bigquery df --project {PROJECT}\n",
    "SELECT *\n",
    "FROM `stock_src.snp500`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 269
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "MCQZ9pi-TeV_",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "3bb1cbeb-e809-49ba-f5c9-a442cc764952"
   },
   "outputs": [],
   "source": [
    "df.industry.value_counts().plot(kind='barh');"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "cHfJHITppQ6j"
   },
   "source": [
    "We can join the price histories table with the S&P 500 table to compare industries:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Learning objective 1,2**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "collapsed": true,
    "id": "W-mtAHFVpQ6l",
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [],
   "source": [
    "%%with_globals\n",
    "%%bigquery df --project {PROJECT}\n",
    "WITH sp_prices AS\n",
    "    (\n",
    "    SELECT a.*, b.industry\n",
    "    FROM `stock_src.price_history` a \n",
    "        JOIN `stock_src.snp500` b\n",
    "    USING (symbol)\n",
    "    WHERE date >= \"2000-01-01\"\n",
    "    )\n",
    "SELECT Date, industry, AVG(close) as close\n",
    "FROM sp_prices\n",
    "GROUP BY Date, industry\n",
    "ORDER BY industry, Date"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 195
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "hidHA05BpQ6n",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "6af1d2f3-14aa-43bf-85b6-b77383b34b84"
   },
   "outputs": [],
   "source": [
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "-Jn8RvYLpQ6s"
   },
   "source": [
    "Using pandas we can \"unstack\" our table so that each industry has it's own column. This will be useful for plotting."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 225
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "j4wd_X1rpQ6s",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "09f2febf-322e-466c-9b17-38ab26fce08c"
   },
   "outputs": [],
   "source": [
    "# Pandas `unstack` to make each industry a column. Useful for plotting.\n",
    "df_ind = df.set_index(['industry', 'Date']).unstack(0).dropna()\n",
    "df_ind.columns = [c[1] for c in df_ind.columns]\n",
    "df_ind.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 546
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "5GVU_jWHpQ6u",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "b8c61f59-ad88-4247-d3b6-6098f2c76388"
   },
   "outputs": [],
   "source": [
    "ax = df_ind.plot(figsize=(16, 8))\n",
    "# Move legend down.\n",
    "ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), shadow=True, ncol=2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "iQy7kVdbpQ6w"
   },
   "source": [
    "Let's scale each industry using min/max scaling. This will put all of the stocks on the same scale. Currently it can be hard to see the changes in stocks over time across industries."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Learning objective 1**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 529
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "Bwm9R_vvpQ6w",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "c22e6ffe-6d97-48bf-9baa-be3b75414664"
   },
   "outputs": [],
   "source": [
    "def min_max_scale(df):\n",
    "    return (df - df.min()) / df.max()\n",
    "scaled = min_max_scale(df_ind)\n",
    "ax = scaled.plot(figsize=(16, 8))\n",
    "ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), shadow=True, ncol=2);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "vgEEq514pQ6y"
   },
   "source": [
    "We can also create a smoothed version of the plot above using a [rolling mean](https://en.wikipedia.org/wiki/Moving_average). This is a useful transformation to make when visualizing time-series data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "colab": {
     "base_uri": "https://localhost:8080/",
     "height": 546
    },
    "colab_type": "code",
    "collapsed": true,
    "id": "ayOWG0kMpQ6z",
    "jupyter": {
     "outputs_hidden": true
    },
    "outputId": "7a1a9abe-c626-4481-a092-2cf76f5bdb73"
   },
   "outputs": [],
   "source": [
    "SMOOTHING_WINDOW = 30  # Days.\n",
    "rolling = scaled.copy()\n",
    "for col in scaled.columns:\n",
    "    rolling[col] = scaled[col].rolling(SMOOTHING_WINDOW).mean()\n",
    "ax = rolling.plot(figsize=(16, 8))\n",
    "ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.05), shadow=True, ncol=2);"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "8XvgaI9YpQ62"
   },
   "source": [
    "Information technology had a large crash during the early 2000s and again in 2008/2009; along with all other stocks. After 2008, some industries were a bit slower to recover than other industries."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "hHF8RrPPQ35I"
   },
   "source": [
    "BONUS: In the next lab, we will want to predict the price of the stock in the future. What are some features that we can use to predict future price? Try visualizing some of these features."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "lIjYu8HgTeXV"
   },
   "source": [
    "Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License"
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "name": "1_data_exploration.ipynb",
   "provenance": []
  },
  "environment": {
   "name": "tf2-gpu.2-1.m68",
   "type": "gcloud",
   "uri": "gcr.io/deeplearning-platform-release/tf2-gpu.2-1:m68"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
